Path: newsfeed.direct.ca!usenet
From: qjackson@direct.ca
Newsgroups: comp.lang.c,comp.lang.c++,comp.std.c
Subject: Re: Problem: Parsing Algorithms????????
Date: Thu, 15 Feb 1996 00:23:18 GMT
Organization: Parsepolis Software
Message-ID: <4ftud4$gnt@aphex.direct.ca>
References: <4535421196525ntc@compuserve.com>
Reply-To: qjackson@direct.ca
NNTP-Posting-Host: 204.174.249.1
X-Newsreader: Forte Free Agent 1.0.82

Matthew Dougherty <76477.1267@compuserve.com> wrote:

>I am writing an ANSI C language program to Parse name, address, phone,
>email, and a couple of other fields from text resumes. The idea is to
>have resumes that are emailed to be automatically entered into a database.

Probably the closest things you're going to find in the C world are lex,
awk, sed, or Perl. Lex would be the most easily integrated, but it
suffers from being static in nature (i.e., search patterns cannot change
at run time once lex has compiled them into C code).

>For street address there are key words like APT. ST. etc.
>ZipCodes are easy to find and prove because there are limited
>possibilities for states or state codes,

>Phone numbers are easy to find.

>Names are difficult. It's basically positional. It can be different
>every time. The name can be alone on a line or on the same line as phone
>or something.

>Ideas are appreciated.

I am currently working on a standard C++ port of LPM, an interpreted
pattern-matching language that I originally implemented in a non-C/C++
language. It will include C wrappers so that it can be called from ANSI
C code. The port is about 40% complete.

LPM lets you scan a string for patterns rather than string literals.
For instance, to find a valid North American phone number in a given
target string, one would use the rule:

    [@(
    [@'('$
    [3'0-9'#
    [@'-)'#
    [)
    [3'0-9'#
    ['-'$
    [4'0-9'#

This might (roughly) be expressed as the following regular expression (RE):

    (\(?[0-9]{3}[-)]?)?[0-9]{3}-[0-9]{4}

(Yes, yes, what a nightmare!)

A more elaborate rule would be needed to find a person's name in a
string, but it can be done with LPM. (For example, I have a program that
uses LPM to find nouns in a text file based on their context within a
sentence.)

If you'd like more information on LPM, just email me.

Cheers,

--
                           |
Parsepolis Software        | Quinn Tyler Jackson
"ParseCity"                | (aka 'Jamshid')
>--------------------------| qjackson@direct.ca
                           |---------------------->